Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved concurrency for multi-threaded indexing #236

Merged
merged 1 commit into from
Mar 11, 2024

Conversation

nigels-com
Copy link

Concurrent tree building is faster on my 12 core laptop for ~130M point clouds, in this arrangement.

The intuition is to avoid having threads wait for both the left and right subtrees if they are both asynchronous.
In this arrangment we first recurse to left on the same thread (depth-first to the left) with an async recursion of the right subtree concurrently.
Once the left is all visited we'll recurse to the right with the current thread.

What this is aiming to avoid is having too many threads blocked (doing no work) waiting for async processing to complete on both left and right.

I see around a 2x speedup for 12 cores and ~130M, around 30 seconds rather than 60 seconds.

@jlblancoc
Copy link
Owner

Great contribution! Let me check this against valgrind and do some minimal benchmarking before merging.

@nigels-com
Copy link
Author

Happy indeed for a second opinion on this.

@jlblancoc
Copy link
Owner

I updated nanoflann benchmark to make some quick benchmarking with this PR. The results are good but, as usual, the optimal approach depends on the data distribution and dataset size:

Build index profiling for the KITTI 04 sequence pointclouds:

Method 1 thread 4 threads 8 threads auto # threads (=0)
Baseline 8.993637 ms 6.922916 ms 6.759240 ms 7.775914 ms
This PR 8.999447 ms 6.235389 ms 7.164032 ms 8.716205 ms

Some graphs:

For 4 threads vs 1 thread

Screenshot from 2024-03-12 00-04-33

For 8 threads vs 1 thread

Screenshot from 2024-03-12 00-03-34

The advantage of this PR is clearer for 4 threads than for 8 threads... for this particular pointcloud.

Anyway, I feel positive about the logic of the change, and valgrind (including helgrind) is happy, so I'm merging it...
Thanks!

@jlblancoc jlblancoc merged commit 0e43d8b into jlblancoc:master Mar 11, 2024
6 checks passed
@nigels-com
Copy link
Author

nigels-com commented Mar 11, 2024

Interesting results. Modest speedup for these (small?) point clouds.
Our typical workload is 100M to 1B points.

@jlblancoc
Copy link
Owner

Yes, they are small in comparison. The horizontal axis is number of points...

But the trend is clear in that multi threading is better for larger clouds.

@nigels-com
Copy link
Author

I'm a bit intrigued about four threads being faster than eight threads.
I'd like to dig into that a little more.

@jlblancoc
Copy link
Owner

@nigels-com The exact place to force a fixed number of threads in our benchmark is here:
/~https://github.com/MRPT/nanoflann-benchmark/blob/master/benchmarkTool/realTests/benchmark_nanoflann_real.cpp#L169-L177

feel free of using that in your project/dataset just to see what happens...

My guess is that results may depend a lot depending on how "random" the distribution of consecutive points are. In 3D LIDARs (as KITTI), point coordinates have a strong correlation with their neighbors in the sequence.

@nigels-com
Copy link
Author

Thanks for the details.

Our points are probably less coherent because the Hovermap lidar is both spinning and moving through space.
We do see better performance sorting them upstream of nanoflann.
So my number and graphs are likely to be quite different to this benchmark.

https://emesent.com/hovermap-series/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants